
Conversation

slachiewicz
Member

@slachiewicz slachiewicz commented Oct 10, 2025

…match in stale data cache

Problem

Starting in version 3.11.3, running mvn javadoc:javadoc twice in succession on Windows with Java 8 fails on the second run with:

java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.maven.plugins.javadoc.AbstractJavadocMojo.isUpToDate(AbstractJavadocMojo.java:5008)

This regression affects Windows users running Java 8, where the default platform encoding is Cp1252.

Root Cause

The issue stems from a charset mismatch in how the stale data cache file is written versus how it's read:

  1. Write operation (StaleHelper.writeStaleData(), line 126):

    • Uses getDataCharset() which returns Charset.defaultCharset() for Java 8
    • On Windows, this is Cp1252
  2. Read operation (AbstractJavadocMojo.isUpToDate(), line 5008):

    • Always uses hardcoded StandardCharsets.UTF_8

When the second run attempts to read a file written with Cp1252 encoding using UTF-8, the non-ASCII bytes (which generally do not form valid UTF-8 sequences) cause a MalformedInputException.
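The failure mode can be reproduced in isolation with a short sketch (editorial illustration, not plugin code; windows-1252 stands in for the Cp1252 default on Windows, and the names here are made up):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetMismatchDemo {
    // Returns true if bytes written in one charset fail to decode as UTF-8,
    // which is exactly what happens on the second javadoc run.
    static boolean failsAsUtf8(byte[] data) throws IOException {
        Path cache = Files.createTempFile("stale-data", ".txt");
        try {
            Files.write(cache, data);
            // Files.readAllLines uses a strict decoder: malformed input throws
            // MalformedInputException instead of silently substituting chars.
            Files.readAllLines(cache, StandardCharsets.UTF_8);
            return false;
        } catch (MalformedInputException e) {
            return true;
        } finally {
            Files.delete(cache);
        }
    }

    public static void main(String[] args) throws IOException {
        // A path containing a non-ASCII character, encoded as Cp1252:
        // '\u00FC' ('ü') becomes the single byte 0xFC, which is invalid UTF-8.
        byte[] cp1252 = "C:\\projekt\\B\u00FCro\\pom.xml"
                .getBytes(Charset.forName("windows-1252"));
        System.out.println("fails as UTF-8: " + failsAsUtf8(cp1252));
    }
}
```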

Solution

Changed StaleHelper to always save in StandardCharsets.UTF_8 instead of the platform-dependent default charset. This ensures:

  • Consistent encoding across all platforms (Windows, macOS, Linux)
  • Consistent encoding across all Java versions (8, 11, 17, 21, etc.)
  • Write and read operations use the same charset
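The resulting symmetric round trip can be sketched as follows (an editorial sketch with simplified method shapes, not the plugin's exact signatures):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class StaleDataRoundTrip {
    // Writer side: always UTF-8, regardless of platform or Java version
    // (the change made in StaleHelper).
    static void writeStaleData(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Reader side: unchanged, already hardcoded to UTF-8
    // (as in AbstractJavadocMojo.isUpToDate).
    static List<String> readStaleData(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("stale", ".txt");
        // Non-ASCII arguments now survive the round trip on every platform.
        List<String> data = Arrays.asList("-d", "C:\\projekt\\B\u00FCro\\apidocs", "\u4E2D\u6587.java");
        writeStaleData(f, data);
        System.out.println(readStaleData(f).equals(data));
        Files.delete(f);
    }
}
```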

Fixes: #1273 #1264

@slachiewicz slachiewicz added the bug Something isn't working label Oct 10, 2025
…match in stale data cache

Changed StaleHelper to always save in UTF-8 instead of
platform-dependent default charset. This ensures consistency with
AbstractJavadocMojo.isUpToDate() which reads the stale data file using UTF-8.

Previously on Windows with Java 8:
- First run: file written with Cp1252 (default charset)
- Second run: file read with UTF-8, causing MalformedInputException
@slachiewicz slachiewicz merged commit b453602 into master Oct 15, 2025
112 of 131 checks passed
@slachiewicz slachiewicz deleted the windowscharset branch October 15, 2025 22:38
@github-actions github-actions bot added this to the 3.12.0 milestone Oct 15, 2025
@fridrich
Contributor

This is basically reverting 33c9f01, which was actually fixing a real problem of inconsistency.

@gnodet
Contributor

gnodet commented Oct 17, 2025

This is basically reverting 33c9f01, which was actually fixing a real problem of inconsistency.

You mean this is reverting https://issues.apache.org/jira/browse/MJAVADOC-614 ?

@fridrich
Contributor

Commit d2dd532 brought an inconsistency that I saw with a project containing Chinese characters, and I made it consistent again with 33c9f01.

I assume this bug comes from something related to the way we determine the charset in that getDataCharset function. The decision may need to depend not only on the Java version but also on the OS/arch.

I will test this fix. If it does not bring the previous bug back, we can just leave it as it is, but if it regresses, we should look for a proper fix.

@gnodet
Contributor

gnodet commented Oct 17, 2025

Commit d2dd532 brought an inconsistency that I saw with a project containing Chinese characters, and I made it consistent again with 33c9f01.

I assume this bug comes from something related to the way we determine the charset in that getDataCharset function. The decision may need to depend not only on the Java version but also on the OS/arch.

I will test this fix. If it does not bring the previous bug back, we can just leave it as it is, but if it regresses, we should look for a proper fix.

@slachiewicz have a look at https://issues.apache.org/jira/browse/MJAVADOC-614; it mentions the intent behind the JDK check, and the fact that @-files use UTF-8 on JDK 9..12 and Charset.defaultCharset() on others.

@fridrich
Contributor

Actually, as I look at it more and more and as I understand it better, I think this particular fix is the right one.

@gnodet
Contributor

gnodet commented Oct 17, 2025

Actually, as I look at it more and more and as I understand it better, I think this particular fix is the right one.

But it won't work well on JDK 9, 10 and 11 as I raised above.

@fridrich
Contributor

fridrich commented Oct 17, 2025

I mean, I ran the ITs with OpenJDK 8, 11, and 17 and they passed.

My only concern is that the mismatch is between what one gets from plexus-utils as a path string (with its encoding) and how one then reads the line. As I understand it now, for Java 8 it boils down to the filesystem encoding itself, because com.sun.tools.javac.main.CommandLine.loadCmdFile in Java 8 does not specify any encoding at all, and going down the code it looks like the encoding is always assumed (or ignored?).
For Java 9-12 (verified on 11), the function calls Reader r = Files.newBufferedReader(Paths.get(name)), which assumes UTF_8.INSTANCE. That was changed to Reader r = Files.newBufferedReader(Paths.get(name), Charset.defaultCharset()) for 13+. Now, can Charset.defaultCharset() diverge from the filesystem encoding? Maybe, and that would actually be the problem we have. If Charset.defaultCharset() governed the file content encoding, that would be a possible mismatch: on Linux, for example, you can have Java 8 running on a filesystem that is UTF-8.
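The per-JDK behavior described above can be summarized in code (an editorial illustration of the javac behavior under discussion, not actual plugin code):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AtFileCharset {
    // Which charset javac's CommandLine effectively uses when reading @-files,
    // per the observations above: no explicit charset on 8 (platform default),
    // UTF-8 on 9-12, Charset.defaultCharset() explicitly on 13+.
    static Charset effectiveCharset(int javaFeatureVersion) {
        if (javaFeatureVersion >= 9 && javaFeatureVersion <= 12) {
            return StandardCharsets.UTF_8;
        }
        return Charset.defaultCharset();
    }
}
```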

@gnodet
Contributor

gnodet commented Oct 17, 2025

I mean, I ran the ITs with OpenJDK 8, 11, and 17 and they passed.

On Windows, where the default encoding is not UTF-8?

I think the problem is that javadoc expects a certain encoding, so we have no choice but to actually use that one. If we always write in UTF-8, I don't see how that would work.

@fridrich
Contributor

I mean, I ran the ITs with OpenJDK 8, 11, and 17 and they passed.

On Windows?

Indeed, no

@fridrich
Contributor

I think the problem is that javadoc expects a certain encoding, so we have no choice but to actually use that one. If we always write in UTF-8, I don't see how that would work.

So maybe it would be better to cross-check where the original problem lies. I had a thought: if, while reading in a try/catch block, we recorded the charset we actually used for reading and then wrote using that same one, it could work. But I am not sure, because the exception is not necessarily thrown when the file is valid UTF-8 but is actually in another encoding.
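That try/catch idea might look like this (an editorial sketch; the fallback charset is a parameter here, and the caveat stands: a successful UTF-8 read does not prove the file was written as UTF-8):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FallbackRead {
    // Try UTF-8 first; if decoding fails, retry with the given fallback.
    // Caveat from the discussion: bytes in another encoding can still happen
    // to be valid UTF-8, so the exception is not a reliable signal.
    static List<String> readWithFallback(Path file, Charset fallback) throws IOException {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (MalformedInputException e) {
            return Files.readAllLines(file, fallback);
        }
    }
}
```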

@fridrich
Contributor

fridrich commented Oct 17, 2025

Or we extract getDataCharset() into SystemUtils or similar, make it public, and use it everywhere we need the charset.
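A hypothetical shape for that shared helper (names and placement are illustrative, not the actual extraction):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class CharsetUtils {
    private CharsetUtils() {}

    // One public decision point, called by both the writer
    // (StaleHelper.writeStaleData) and the reader (AbstractJavadocMojo.isUpToDate),
    // so the two sides can never disagree again. The version rule mirrors the
    // @-file behavior discussed above: UTF-8 on JDK 9-12, default elsewhere.
    public static Charset getDataCharset() {
        int feature = featureOf(System.getProperty("java.specification.version"));
        return (feature >= 9 && feature <= 12) ? StandardCharsets.UTF_8 : Charset.defaultCharset();
    }

    // "1.8" -> 8, "11" -> 11, "17" -> 17
    static int featureOf(String spec) {
        return spec.startsWith("1.") ? Integer.parseInt(spec.substring(2)) : Integer.parseInt(spec);
    }
}
```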

@fridrich
Contributor

Let me craft something.

fridrich added a commit to fridrich/maven-javadoc-plugin that referenced this pull request Oct 17, 2025
@fridrich
Contributor

Something like #1278


Labels

bug Something isn't working hacktoberfest-accepted


Development

Successfully merging this pull request may close these issues.

Regression starting in 3.11.3: java.nio.charset.MalformedInputException: Input length = 1
